2026-01-02
The task is formulated as a sequence-to-sequence style-transfer problem: an input sentence in the normal Khmer register is transformed into its corresponding royal-register sentence.
Let \(X = (x_1, \dots, x_T)\) denote the input character sequence in the normal register, \(Y = (y_1, \dots, y_{T'})\) its royal-style counterpart, \(\theta\) the model parameters, and \(\mathcal{D}\) the set of parallel training pairs.
Conditional Probability
\[ P(Y \mid X; \theta) = \prod_{t=1}^{T'} P\!\left(y_t \mid y_1, \dots, y_{t-1}, X; \theta\right) \]
Training Objective
\[ \theta^* = \arg\max_{\theta} \sum_{(X,Y)\in\mathcal{D}} \log P(Y \mid X; \theta) \]
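The factorization and training objective above can be sketched numerically. The per-step distributions below are hypothetical stand-ins for a decoder's outputs under teacher forcing; the toy four-character vocabulary and values are illustrative only.

```python
import numpy as np

# Hypothetical per-step distributions from a decoder (teacher forcing):
# probs[t] is P(y_t | y_<t, X) over a toy 4-character vocabulary.
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.80, 0.10, 0.05],
    [0.10, 0.10, 0.75, 0.05],
])
target = [0, 1, 2]  # indices of the gold characters y_1 .. y_{T'}

# Chain rule: P(Y|X) = prod_t P(y_t | y_<t, X), hence
# log P(Y|X) = sum_t log P(y_t | y_<t, X).
log_likelihood = sum(np.log(probs[t, y]) for t, y in enumerate(target))

# Training maximizes this over D, i.e. minimizes the negative log-likelihood.
nll = -log_likelihood
```

Summing log-probabilities rather than multiplying raw probabilities is the standard way to keep the objective numerically stable for long sequences.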
Attention Mechanism
The mechanism begins by calculating how well each encoder state \(h_s^{enc}\) matches the decoder's current needs.
Alignment Score: Measures the relevance of input \(s\) at decoding step \(t\): \[e_{t,s} = h_{t-1}^{dec} \cdot h_s^{enc}\]
Attention Weight: Normalizes scores into a probability distribution using Softmax: \[\alpha_{t,s} = \frac{\exp(e_{t,s})}{\sum_{k=1}^{T} \exp(e_{t,k})}\]
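The two steps above can be sketched as a small NumPy function. The encoder states and decoder state are illustrative toy vectors, not values from a trained model; the max-subtraction inside the softmax is a standard numerical-stability trick.

```python
import numpy as np

def attention_weights(dec_state, enc_states):
    """Dot-product alignment scores e_{t,s}, softmax-normalized to alpha_{t,s}."""
    scores = enc_states @ dec_state        # e_{t,s} = h_{t-1}^dec . h_s^enc
    scores = scores - scores.max()         # stabilize exp() without changing the result
    weights = np.exp(scores)
    return weights / weights.sum()         # softmax over s = 1..T

# Toy example: T = 3 encoder states of hidden size 2.
enc = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
dec = np.array([0.5, 0.5])
alpha = attention_weights(dec, enc)
# alpha sums to 1 and puts the most mass on the encoder state
# best aligned with the decoder state (here the third one).
```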
The model then extracts the relevant information to generate the next character.
Context Vector (\(c_t\)): A weighted sum of all encoder hidden states: \[c_t = \sum_{s=1}^{T} \alpha_{t,s} h_s^{enc}\]
Final Prediction: The decoder hidden state \(h_t^{dec}\) is updated with \(c_t\), and the next character is predicted: \[P(y_t \mid y_{<t}, X) = \text{Softmax}(W_{hy} h_t^{dec} + b_y)\]
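A minimal sketch of the context vector and the output layer follows. The sizes, the random weights \(W_{hy}\), and the stand-in for \(h_t^{dec}\) are all hypothetical: in the actual model the decoder RNN produces \(h_t^{dec}\) from \(c_t\) and the previous character, whereas here a fixed nonlinearity is used just to exercise the prediction step.

```python
import numpy as np

rng = np.random.default_rng(0)
T, hidden, vocab = 3, 4, 5                  # toy sizes, all hypothetical

enc_states = rng.normal(size=(T, hidden))   # h_s^enc for s = 1..T
alpha = np.array([0.2, 0.5, 0.3])           # attention weights for step t

# Context vector: weighted sum of all encoder hidden states.
c_t = alpha @ enc_states                    # shape (hidden,)

# Stand-in for the decoder update h_t^dec = f(c_t, y_{t-1}, h_{t-1}^dec).
h_t = np.tanh(c_t)

# Output layer: P(y_t | y_<t, X) = Softmax(W_hy h_t + b_y).
W_hy = rng.normal(size=(vocab, hidden))
b_y = np.zeros(vocab)
logits = W_hy @ h_t + b_y
p = np.exp(logits - logits.max())
p /= p.sum()                                # valid distribution over the vocabulary
```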